Search for a command to run...
Woosuk and Simon from UC Berkeley discuss their open-source inference engine VLLM and their new company Inferact, which aims to build a universal infrastructure layer for running AI models efficiently across different hardware and model architectures.